Power/performance Advantages of Victim Buffer in High-performance Processors

Authors

  • Gianluca Albera
  • R. Iris Bahar
Abstract

In this paper, we propose several different data cache configurations and analyze their power as well as performance implications on the processor. Unlike most existing work in low power microprocessor design, we explore a high performance processor with the latest innovations for performance. Using a detailed, architectural-level simulator, we evaluate full system performance using several different power/performance sensitive cache configurations. We then use the information obtained from the simulator to calculate the energy consumption of the memory hierarchy of the system. We show that a victim buffer offers improved cache energy consumption over other techniques (10% compared to 3.8%), while at the same time providing comparable performance gains (3.54% compared to 3.45%).
In this paper we will concentrate on reducing the energy demands of an ultra high-performance processor, such as the Pentium Pro or the Alpha 21264, which uses superscalar, speculative, out-of-order execution. In particular, we will investigate architectural-level solutions that achieve a power reduction in the memory subsystem of the processor without compromising performance.
Prior research has been aimed at measuring and recommending optimal cache configurations for power. In [9], the authors determined that high performance caches were also the lowest power consuming caches since they reduce the traffic to the lower level of the memory system. The work by Kin [7] proposed accessing a small filter cache before accessing the first level cache to reduce the accesses (and energy consumption) from DL1. The idea led to a large reduction in memory hierarchy energy consumption, but also resulted in a substantial reduction in processor performance. While this reduction in performance may be tolerable for some applications, the high-end market will not make such a sacrifice. This paper will propose memory hierarchy configurations that reduce power while retaining performance.
Reducing cache misses due to line conflicts has been shown to be effective in improving overall system performance in high-performance processors. Techniques to reduce conflicts include increasing cache associativity, use of victim caches [5], or cache bypassing with and without the aid of a buffer [4, 8, 10]. Figure 1 shows the design of the memory hierarchy when using a buffer alongside the first level data cache. Also included in the
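
To make the conflict-miss argument concrete, the following is a minimal sketch of a direct-mapped first level data cache backed by a small, fully associative victim buffer. It is not the simulator used in the paper: the class name `DirectMappedWithVictim`, its parameters, and the FIFO replacement policy of the buffer are assumptions chosen only for illustration.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Toy model (hypothetical, for illustration only): a direct-mapped
    cache plus a small fully associative victim buffer with FIFO
    replacement. Only block numbers are tracked, not data."""

    def __init__(self, num_sets, line_size, victim_entries):
        self.num_sets = num_sets
        self.line_size = line_size
        self.victim_entries = victim_entries
        self.lines = [None] * num_sets  # block number held by each set
        self.victim = OrderedDict()     # evicted blocks, oldest first
        self.l1_hits = 0
        self.victim_hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_sets

        if self.lines[index] == block:
            self.l1_hits += 1
            return "l1_hit"

        evicted = self.lines[index]
        if block in self.victim:
            # Conflict miss caught by the buffer: the block moves back
            # into the cache without going to the next memory level.
            del self.victim[block]
            self.victim_hits += 1
            result = "victim_hit"
        else:
            # Block must be fetched from the next memory level.
            self.misses += 1
            result = "miss"

        self.lines[index] = block
        if evicted is not None and self.victim_entries > 0:
            # The line displaced from the cache becomes the newest victim.
            self.victim[evicted] = True
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest victim
        return result


# Two addresses that map to the same set and evict each other repeatedly.
trace = [0, 64, 0, 64]

with_buffer = DirectMappedWithVictim(num_sets=4, line_size=16, victim_entries=2)
without_buffer = DirectMappedWithVictim(num_sets=4, line_size=16, victim_entries=0)
for addr in trace:
    with_buffer.access(addr)
    without_buffer.access(addr)

print("misses with buffer:", with_buffer.misses)
print("misses without buffer:", without_buffer.misses)
```

In this toy trace the two addresses keep displacing each other from the same set, so every access after the first two is served from the buffer instead of the next memory level. Fewer accesses to lower levels is the effect the text associates with both performance and energy, although the sketch itself models neither timing nor energy.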

Similar resources

Composite Pseudo Associative Cache with Victim Cache for Mobile Processors

Problem statement: Multi-core trends are becoming dominant, creating sophisticated and complicated cache structures. One of the easiest ways to design cache memory for increasing performance is to double the cache size. The big cache size is directly related to the area and power consumption. Especially in mobile processors, simple increase of the cache size may significantly affect its chip ar...

Modulo 2n+1 Multiplier and Multiply-Accumulator for Digital Signal Processors

Nowadays, digital signal processors (DSPs) are appropriate choices for real-time image and video processing in embedded multimedia applications, not only because of their superior signal processing performance but also because of their high levels of integration and very low power consumption. Filtering, which consists of multiple addition and multiplication operations, is one of the most fundamental operatio...

Improving Memory Access Performance Using a Code Coalescing Unit

High clock frequencies combined with deep pipelining employed by many of the state-of-the-art processors have forced cache hit accesses to be multi-cycle operations. For many programs, untolerated load latencies account for a significant portion of total execution time. In this paper, we present a mechanism called the Code Coalescing Unit (CCU) that can identify and eliminate at run-time several...

High speed Radix-4 Booth scheme in CNTFET technology for high performance parallel multipliers

A novel and robust scheme for radix-4 Booth scheme implemented in Carbon Nanotube Field-Effect Transistor (CNTFET) technology has been presented in this paper. The main advantage of the proposed scheme is its improved speed performance compared with previous designs. With the help of modifications applied to the encoder section using Pass Transistor Logic (PTL), the corresponding capacitances o...

Green Energy-aware task scheduling using the DVFS technique in Cloud Computing

Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...

Publication date: 1998